
[SVLS-8979] Add CloudFormation template for Lambda Durable Function event forwarder#1149

Open: lym953 wants to merge 9 commits into master from yiming.luo/durable-event-forwarder

@lym953 lym953 commented Jun 5, 2026

Contributor

Context

To capture TIMED_OUT and STOPPED status of Lambda durable function executions, we need to capture the status change events in EventBridge and forward them to Datadog. This involves three changes:

  1. AWS client-side integration: an EventBridge rule captures the status change events, and Firehose batches them and sends them to Datadog as logs. The CloudFormation template will be published to S3 so customers can install it with one click, the same way they install the Datadog forwarder. (This PR)
  2. Logs pipeline on Datadog side: transforms the logs (https://github.com/DataDog/integrations-internal-core/pull/3389)
  3. Datadog's Lambda UI: consume the logs

Architecture

[image: architecture diagram]

This is Option 4.3 in the design doc. See the doc for why we need to capture the status change events.

Changes

  • Add a CloudFormation template for the AWS-side resources
  • Add README.md

Params of the CloudFormation template

  • Three ways to supply the Datadog API key:
    • Plaintext
    • Secrets Manager secret ARN
    • SSM SecureString parameter name
  • Datadog site, defaults to datadoghq.com
  • Statuses to forward. Defaults to empty, which forwards all statuses: RUNNING, SUCCEEDED, FAILED, TIMED_OUT, STOPPED.
  • Function ARN filters. Can be either the unqualified ARN for a single function, e.g. arn:aws:lambda:us-east-2:425362996713:function:my-durable-function, or a wildcard pattern, e.g. arn:aws:lambda:us-east-2:425362996713:function:my-durable-*.
    • Up to 5 filters are supported. If customers ask for more, we can consider adding them. However, I expect most customers to leave the filters empty so that the stack covers all durable functions in the region.
  • Firehose buffer interval, defaults to 60 seconds. This is the interval at which events are sent to Datadog.
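
As a rough illustration of the three API key options listed above, the selection logic could look like the sketch below. It is written in Python only to make the rule explicit; the real template expresses this in CloudFormation, and the PR text does not show how. The function name `pick_api_key_value` is hypothetical. The `{{resolve:...}}` strings follow CloudFormation's documented dynamic reference syntax, but whether the template uses exactly these forms is an assumption.

```python
from typing import Optional


def pick_api_key_value(
    dd_api_key: Optional[str] = None,
    dd_api_key_secret_arn: Optional[str] = None,
    dd_api_key_ssm_parameter_name: Optional[str] = None,
) -> str:
    """Return the string that would be placed in the Firehose access-key field.

    Hypothetical helper: exactly one of the three options must be supplied.
    """
    supplied = {
        "plaintext": dd_api_key,
        "secret_arn": dd_api_key_secret_arn,
        "ssm_name": dd_api_key_ssm_parameter_name,
    }
    chosen = {kind: value for kind, value in supplied.items() if value}
    if len(chosen) != 1:
        raise ValueError(f"expected exactly one API key option, got {len(chosen)}")

    kind, value = next(iter(chosen.items()))
    if kind == "secret_arn":
        # Secrets Manager dynamic reference (assumed form).
        return "{{resolve:secretsmanager:" + value + ":SecretString}}"
    if kind == "ssm_name":
        # SSM SecureString dynamic reference (assumed form).
        return "{{resolve:ssm-secure:" + value + "}}"
    return value  # plaintext key passed through unchanged
```

With the two dynamic-reference options, the value is resolved by CloudFormation at deploy time, so the plaintext key never has to be written into the template or its parameters.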

Next steps

  • Enable releasing to Datadog Prod account

Test plan

Steps

  • Upload the CloudFormation template to S3 in Datadog Serverless Sandbox by running:
  aws-vault exec sso-serverless-sandbox-account-admin -- \
    aws s3 cp template.yaml \
    s3://datadog-cloudformation-template-serverless-sandbox/aws/lambda-durable-function-event-forwarder/0.0.1.yaml \
    --content-type text/yaml

  aws-vault exec sso-serverless-sandbox-account-admin -- \
    aws s3 cp template.yaml \
    s3://datadog-cloudformation-template-serverless-sandbox/aws/lambda-durable-function-event-forwarder/latest.yaml \
    --content-type text/yaml

Result

After a few minutes, the logs appeared in Datadog.
[image: screenshot of the forwarded logs in Datadog]

Captures AWS Lambda Durable Function execution status change events from
EventBridge and delivers them to the Datadog HTTP intake via Amazon Data
Firehose. Records arrive at Datadog as the raw EventBridge envelope;
reshaping (ARN qualifier stripping, detail.* flattening, ISO timestamp
parsing) is configured on the Datadog side via a logs processing
pipeline rather than inside the stack.
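
To make the three reshaping steps named above concrete, here is a minimal Python sketch of what they amount to. It is only an illustration: the actual reshaping is configured in a Datadog logs pipeline, not in Python. The helper names are hypothetical, the field names (functionArn, startTimestamp, endTimestamp) are taken from a later commit in this PR, and treating the timestamps as ISO 8601 strings is an assumption.

```python
from datetime import datetime


def strip_qualifier(function_arn: str) -> str:
    """Drop a trailing :version or :alias from a Lambda function ARN."""
    parts = function_arn.split(":")
    # An unqualified function ARN has 7 colon-separated parts:
    # arn:aws:lambda:<region>:<account>:function:<name>
    return ":".join(parts[:7])


def reshape(envelope: dict) -> dict:
    """Flatten detail.* to the top level, strip the ARN qualifier, parse timestamps."""
    flat = {k: v for k, v in envelope.items() if k != "detail"}
    # detail.* flattening: keys in detail win on a name collision (assumption).
    flat.update(envelope.get("detail", {}))
    if "functionArn" in flat:
        flat["functionArn"] = strip_qualifier(flat["functionArn"])
    for key in ("startTimestamp", "endTimestamp"):
        if key in flat:
            # Accept a trailing "Z" on Python versions whose fromisoformat rejects it.
            flat[key] = datetime.fromisoformat(flat[key].replace("Z", "+00:00"))
    return flat
```

Keeping this logic out of the stack means the stack only forwards the raw EventBridge envelope.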

Resources created (9): S3 backup bucket + policy, Firehose delivery
stream + role + log group + 2 log streams, EventBridge rule + role.
When DdApiKeyKmsCiphertext is set, four additional resources are
provisioned to decrypt the API key at deploy time via a custom resource
(IAM role, Lambda, log group, Custom::DatadogApiKeyKmsDecrypt).

Four mutually exclusive API-key options:
- DdApiKey (plaintext, NoEcho)
- DdApiKeySecretArn (Secrets Manager dynamic reference)
- DdApiKeySsmParameterName (SSM SecureString dynamic reference)
- DdApiKeyKmsCiphertext + DdApiKeyKmsKeyArn (deploy-time decrypter)

Five independent function-name filter slots (FunctionNameFilter1..5),
each producing two matchers — unqualified ARN + version/alias-qualified
ARN — so events for any qualifier form are captured. Empty slots are
stripped from the EventBridge rule at deploy time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the aws label Jun 5, 2026
@lym953 lym953 changed the title Add CloudFormation template for Lambda Durable Function event forwarder [SVLS-8979] Add CloudFormation template for Lambda Durable Function event forwarder Jun 5, 2026
lym953 and others added 7 commits June 11, 2026 14:20
Publishes template.yaml to the public datadog-cloudformation-template bucket at
aws/lambda-durable-function-event-forwarder/<version>.yaml (+ latest.yaml),
authenticating to the Datadog Prod account (464622532012) via the
prod-engineering role. Requires a semver version arg, validates the template,
and refuses to overwrite an already-published version. README publishing
section updated to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
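
The release checks described in the commit above (semver argument, refusal to overwrite a published version, versioned key plus latest.yaml) can be sketched as follows. This is a Python illustration, not the contents of release.sh, which is a shell script not shown on this page. The function name `plan_upload` is hypothetical, the semver check is simplified to plain MAJOR.MINOR.PATCH, and the caller is assumed to supply the set of keys already in the bucket.

```python
import re

# Simplified semver: MAJOR.MINOR.PATCH only, no pre-release or build metadata.
SEMVER = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")
PREFIX = "aws/lambda-durable-function-event-forwarder"


def plan_upload(version: str, existing_keys: set) -> list:
    """Return the S3 keys to write for a release, or raise if it must not proceed."""
    if not SEMVER.fullmatch(version):
        raise ValueError(f"not a semver version: {version!r}")
    versioned_key = f"{PREFIX}/{version}.yaml"
    if versioned_key in existing_keys:
        # Published versions are immutable; only latest.yaml is ever replaced.
        raise FileExistsError(f"{versioned_key} is already published")
    return [versioned_key, f"{PREFIX}/latest.yaml"]
```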
- FunctionNameFilter{1..5} -> FunctionArnFilter{1..5}: users now supply an
  unqualified function ARN (or wildcard over one) instead of a bare name.
  We append ":*" since detail.functionArn is always version/alias-qualified,
  and an AllowedPattern rejects a pasted qualified ARN at deploy time.
- Statuses now defaults to "" (forward all). The status key is dropped from
  the EventPattern when empty, and the whole detail block is omitted when
  neither a status nor a function filter is set (empty detail:{} is invalid).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
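
The filter behaviour described in the commit above can be sketched in Python as shown below. This is an approximation under stated assumptions, not the template's actual logic: the regular expression is a hypothetical stand-in for the AllowedPattern, the use of EventBridge's `wildcard` matcher is assumed (the commit only says ":*" is appended), and the event source and detail-type are left to the caller because this page does not show them.

```python
import re

# Hypothetical stand-in for the template's AllowedPattern: an unqualified
# function ARN whose name may contain "*" wildcards. A version or alias
# suffix (":1", ":prod") does not match and is rejected.
UNQUALIFIED_ARN = re.compile(
    r"arn:aws[a-z-]*:lambda:[a-z0-9-]+:\d{12}:function:[A-Za-z0-9_*-]+"
)


def build_event_pattern(base: dict, statuses: str, arn_filters: list) -> dict:
    """Build an EventBridge event pattern from the stack's filter parameters."""
    pattern = dict(base)
    detail = {}

    wanted = [s.strip() for s in statuses.split(",") if s.strip()]
    if wanted:  # empty Statuses means "forward all", so the key is dropped
        detail["status"] = wanted

    matchers = []
    for arn in arn_filters:
        if not arn:
            continue  # empty filter slots are dropped
        if not UNQUALIFIED_ARN.fullmatch(arn):
            raise ValueError(f"expected an unqualified function ARN: {arn!r}")
        # detail.functionArn is always qualified, so match any qualifier.
        matchers.append({"wildcard": arn + ":*"})
    if matchers:
        detail["functionArn"] = matchers

    if detail:  # an empty detail block is invalid, so omit it entirely
        pattern["detail"] = detail
    return pattern
```

For example, calling it with Statuses set to "TIMED_OUT,STOPPED" and one filter of arn:aws:lambda:us-east-2:425362996713:function:my-durable-* yields a detail block with two status values and a single wildcard matcher ending in my-durable-*:*.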
- Remove the DdApiKeyKmsCiphertext / DdApiKeyKmsKeyArn key path and its four
  deploy-time decrypter resources (Lambda + role + log group + custom
  resource). It was carried over from the Lambda forwarder's DD_KMS_API_KEY
  pattern, but that forwarder already has a runtime Lambda; here it meant a
  whole custom-resource decrypter for the least-secure of the options. The
  two dynamic-reference paths (Secrets Manager, SSM SecureString) cover the
  "keep plaintext out of the template" need more securely at zero resource
  cost. API key is now one of three. Also drops the now-unneeded W1030
  cfn-lint suppression.
- Trim verbose parameter descriptions; list valid Statuses values
  (RUNNING/SUCCEEDED/FAILED/TIMED_OUT/STOPPED); replace em-dashes with ASCII
  so they render correctly in the CloudFormation console.
- release.sh: validate with local cfn-lint instead of
  cloudformation:ValidateTemplate (the publishing role is scoped to S3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move implementation detail out of customer-facing parameter Descriptions:
drop the Firehose URL derivation from DdSite, the service-taxonomy guidance
from DdService, and the status-matching/EventBridge-rule notes from Statuses.
Preserve the one fact not otherwise self-documenting (the API key becomes the
X-Amz-Firehose-Access-Key header; dynamic references keep plaintext out of the
template) as a comment at the AccessKey field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Append "(Optional)" to the Event filters parameter group and "(optional)"
to the Statuses label, matching the Lambda forwarder's labeling and the
sibling FunctionArnFilter labels, so the console makes clear nothing in the
section is required.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Publishing is handled in a separate PR. Remove the release.sh usage steps and
release checklist from the README and the release.sh row from the Files table,
keeping the published-URL pattern and quick-create link. (The release.sh file
itself was removed in the previous commit.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DdService, DdEnv, DdVersion, and DdTags were declared but never wired to
anything (the Firehose intake can't carry them as proper facets, so the
stack transmitted nothing). They misled users into thinking they set tags.
Remove them and the dead HasEnv/HasVersion/HasTags conditions; DdSite stays
(it builds the Firehose URL). Service/env/version/tags are set in the
Datadog log processing pipeline instead. cfn-lint is now warning-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lym953 lym953 requested review from a team and jchrostek-dd June 12, 2026 18:38
- Cut the Publishing/Deploying/nested-stack/Filtering/Files/Notes sections;
  the Datadog-side pipeline section now just says to install the AWS Lambda
  integration (its OOTB logs pipeline is provisioned automatically).
- Correct the example payload to the field names from AWS's "Monitoring
  durable functions" doc (durableExecutionArn, durableExecutionName,
  functionArn, status, startTimestamp; endTimestamp for terminal states) and
  link the doc. The previous executionName/executionStartTime/executionEndTime
  fields were inaccurate.
- Remove the speculative auto-tag enumeration / integration attribution; state
  only that tagging and reshaping happen on the Datadog side.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
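
The corrected payload fields listed in the commit above could be checked with a small Python helper like this one. It is a sketch, not part of the PR: the function name `missing_fields` is hypothetical, and treating every status other than RUNNING as terminal is an assumption based on the status list in the PR description.

```python
# Field names as listed in the commit message above.
REQUIRED_FIELDS = (
    "durableExecutionArn",
    "durableExecutionName",
    "functionArn",
    "status",
    "startTimestamp",
)
# Assumption: every status except RUNNING is terminal.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "STOPPED"}


def missing_fields(detail: dict) -> list:
    """Return the expected detail fields that are absent from an event."""
    missing = [name for name in REQUIRED_FIELDS if name not in detail]
    if detail.get("status") in TERMINAL_STATUSES and "endTimestamp" not in detail:
        missing.append("endTimestamp")  # only expected for terminal states
    return missing
```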
@lym953 lym953 requested review from a team, Copilot and nina9753 June 12, 2026 19:03
@lym953 lym953 marked this pull request as ready for review June 12, 2026 19:04
@lym953 lym953 requested a review from a team as a code owner June 12, 2026 19:04

Copilot AI left a comment


Pull request overview

Adds an installable, self-contained CloudFormation stack that captures AWS Lambda Durable Function execution status-change events from EventBridge and forwards them to Datadog via an Amazon Data Firehose (formerly Kinesis Data Firehose) HTTP endpoint, with an S3 bucket as a backup for failed records.

Changes:

  • Introduces a new CloudFormation template that provisions EventBridge rule, Firehose delivery stream, IAM roles, and an S3 backup bucket.
  • Documents architecture, parameters, outputs, and the forwarded EventBridge event shape in a new README.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| aws/durable_function_event_forwarder/template.yaml | Provisions EventBridge → Firehose → Datadog intake pipeline plus S3 failed-records backup. |
| aws/durable_function_event_forwarder/README.md | Explains deployment parameters, outputs, and the raw EventBridge envelope forwarded to Datadog. |


Comment on lines +273 to +277
- Effect: Allow
  Action:
    - logs:PutLogEvents
  Resource:
    - !GetAtt FirehoseLogGroup.Arn
Version: "0.1.0"

Parameters:
# ---- Datadog API key (exactly one of the three is required) ----
Comment on lines +292 to +296
# The API key becomes the X-Amz-Firehose-Access-Key header on each
# request and is stored opaquely by Firehose. The two dynamic-
# reference paths resolve the value straight into this resource at
# deploy time, so the plaintext never appears in the template source,
# the stack parameters, or stack events.